Journal of Open Source Software — Latest Matching Preprints

1

Ma, E.; Kummer, A.

2020-05-13 bioengineering 10.1101/2020.05.11.088344 medRxiv

Top 0.1%

60.6%

UniRep is a recurrent neural network model trained on 24 million protein sequences, and has shown utility in protein engineering. The original model, however, has rough spots in its implementation, and a convenient API is not available for certain tasks. To rectify this, we reimplemented the model in JAX/NumPy, achieving near-100X speedups in forward pass performance, and implemented a convenient API for specialized tasks. In this article, we wish to document our model reimplementation process with the goal of educating others interested in learning how to dissect a deep learning model, and engineer it for robustness and ease of use.
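The speedups described above come from rewriting the forward pass in JAX, where a recurrent network is commonly expressed as a scan of a step function over the sequence (`jax.lax.scan(f, init, xs)` expects `f(carry, x)` to return `(new_carry, y)`). The following is a minimal, dependency-free sketch of that contract. The `toy_step` cell is a hypothetical stand-in for illustration, not UniRep's mLSTM.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # carry (hidden state)
X = TypeVar("X")  # per-step input
Y = TypeVar("Y")  # per-step output


def scan(step: Callable[[C, X], Tuple[C, Y]], init: C, xs: Sequence[X]) -> Tuple[C, List[Y]]:
    """Pure-Python analogue of a scan: thread the carry through step() over xs."""
    carry = init
    ys: List[Y] = []
    for x in xs:
        carry, y = step(carry, x)
        ys.append(y)
    return carry, ys


def toy_step(h: float, x: float, decay: float = 0.5) -> Tuple[float, float]:
    """Hypothetical recurrent cell (an exponential moving average), NOT UniRep's mLSTM."""
    h_new = decay * h + (1.0 - decay) * x
    return h_new, h_new  # new carry, per-step output


final, outputs = scan(toy_step, 0.0, [1.0, 1.0, 1.0])
```

Because the step is a pure function of `(carry, x)`, the same structure can be handed to `jax.lax.scan` and compiled, which is one common route to fast recurrent forward passes in JAX.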

2

Internet of Things Architecture for High Throughput Biology

Parks, D. F.; Voitiuk, K.; Geng, J.; Elliott, M. A. T.; Keefe, M. G.; Jung, E. A.; Robbins, A.; Baudin, P. V.; Ly, V. T.; Hawthorne, N.; Yong, D.; Sanso, S. E.; Rezaee, N.; Sevetson, J. L.; Seiler, S. T.; Currie, R.; Hengen, K. B.; Nowakowski, T. J.; Salama, S. R.; Teodorescu, M.; Haussler, D.

2021-08-01 bioengineering 10.1101/2021.07.29.453595 medRxiv

Top 0.1%

39.7%

The Internet of Things (IoT) provides a simple framework for controlling networked devices. IoT is now a commonplace tool used by technology companies, but it is rarely used in biology experiments. IoT can benefit research through alarm notifications, automation, and the real-time monitoring of experiments. We developed and implemented an IoT architecture to control biological devices used in experiments. We developed our own electrophysiology, microscopy, and microfluidic devices so that they may be controlled through a unified IoT architecture. The system allows each device to be monitored and controlled through an online web tool. We present our IoT architecture so that other labs may replicate it for their own experiments.
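Alarm notifications of the kind mentioned above are often built on a publish/subscribe pattern: devices publish telemetry to topics, and subscribers react to it. The following is a minimal in-process sketch, assuming hypothetical topic names, payload fields, and a temperature limit. A real deployment would use a networked message broker (MQTT is a common IoT choice, although the abstract does not name the protocol used).

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class Broker:
    """Minimal in-process publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic_prefix: str, handler: Handler) -> None:
        self._subs[topic_prefix].append(handler)

    def publish(self, topic: str, payload: dict) -> int:
        """Deliver payload to every handler whose prefix matches; return delivery count."""
        delivered = 0
        for prefix, handlers in self._subs.items():
            if topic.startswith(prefix):
                for handler in handlers:
                    handler(topic, payload)
                    delivered += 1
        return delivered


alarms: List[str] = []


def temperature_alarm(topic: str, payload: dict, limit: float = 38.0) -> None:
    # Hypothetical alarm rule: record a message when temperature exceeds the limit.
    if payload.get("temp_c", 0.0) > limit:
        alarms.append(f"{topic}: {payload['temp_c']} C exceeds {limit} C")


broker = Broker()
broker.subscribe("lab/incubator/", temperature_alarm)
broker.publish("lab/incubator/1/telemetry", {"temp_c": 37.0})
broker.publish("lab/incubator/1/telemetry", {"temp_c": 39.5})
```

Routing by topic prefix lets a single web tool or alarm service watch a whole class of devices without knowing each device individually.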

3

Methods for simulations with thousands of interacting objects to model mass transport in the brain

Elbert, D. L.; Esguerra, D. P.

2025-04-08 bioengineering 10.1101/2025.04.03.647052 medRxiv

Top 0.1%

39.5%

Clearance of toxic species that may cause neurodegenerative diseases relies on convection and diffusion of mass around cells in the central nervous system. In this manuscript, methods that allow the nanoscale modeling of mass transport around cells in the brain at a resolution of 8 nm are described. The cells are modeled as surface meshes, and a parallelization scheme is used to solve the diffusion equation directly on the surface mesh of each biological cell independently. This is followed by mass exchange between biological cells that are in direct contact. The analysis volume size is fixed at 4 µm × 4 µm × 4 µm, but arrays of analysis volumes of arbitrary size may be analyzed, with fluxes across analysis volume boundaries updated at each time step by a semi-implicit formulation. Setup of the discretized equations is described, along with the face matching that allows mass transfer between cells. Parallelization is via a manager/worker framework: one-sided RMA (remote memory access) communication in MPI is used to coordinate the efforts of multiple workers, with common information handled by the manager. This specialized framework is suitable for analyzing diffusional transport around thousands of interacting objects simultaneously. The framework is implemented with custom code developed in Julia, which is used here as a rapid prototyping language. Using model geometries, the accuracy of the discretization methods is demonstrated, and limitations and next steps are described. Author summary: The methods that enable modeling diffusional mass transport around thousands of objects in parallel are described. The accuracy of the method with model objects is demonstrated.
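The two-phase update described above (independent diffusion on each cell's mesh, then exchange across faces of cells in contact) can be sketched serially as follows. This is not the authors' Julia/MPI discretization. It assumes equal face areas, a single lumped rate `k_dt`, an explicit time step, and hypothetical toy meshes and contact pairs.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def diffuse(conc: List[float], edges: List[Edge], k_dt: float) -> List[float]:
    """Explicit, conservative update on one cell's mesh: every edge (i, j)
    moves k_dt * (c_i - c_j) from face i to face j, using the old values."""
    new = list(conc)
    for i, j in edges:
        flux = k_dt * (conc[i] - conc[j])
        new[i] -= flux
        new[j] += flux
    return new


def exchange(a: List[float], b: List[float], contacts: List[Edge], k_dt: float) -> None:
    """Mass exchange across matched faces of two touching cells (in place)."""
    for fa, fb in contacts:
        flux = k_dt * (a[fa] - b[fb])
        a[fa] -= flux
        b[fb] += flux


# Hypothetical toy meshes: cell A has 3 faces, cell B has 2; face 2 of A touches face 0 of B.
edges_a, edges_b, contacts = [(0, 1), (1, 2)], [(0, 1)], [(2, 0)]
a, b = [1.0, 0.0, 0.0], [0.0, 0.0]
for _ in range(500):
    a = diffuse(a, edges_a, 0.2)   # per-cell solves are independent (parallelizable)
    b = diffuse(b, edges_b, 0.2)
    exchange(a, b, contacts, 0.2)  # then exchange between cells in direct contact
```

Because every flux is subtracted from one face and added to another, total mass is conserved exactly (up to rounding), and the concentrations relax toward a uniform value. Keeping the per-cell step independent is what allows each cell to be assigned to a separate worker.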

4

inSequio: A Programmable 3D CAD Application for Designing DNA Nanostructures

LaRock, C.; Sorensen, P.; Blair, D.; Murphy, D.; O'Connor, J.; Armentrout, S.

2024-03-30 bioengineering 10.1101/2024.03.27.586810 medRxiv

Top 0.1%

34.4%

DNA nanotechnology is evolving rapidly, paralleling the historic trajectory of the 1970s electronics industry. However, current DNA nanostructure (DN) design software limits users to either manual design with minimal automation or a constrained range of automated designs. inS[e]quio Design Studio, developed by Parabon® NanoLabs, bridges this gap as a programmable 3D computer-aided design (CAD) application, integrating a domain-specific graphical editor with a Python API for versatile DN design. Developed in C++ for Windows® and Macintosh® systems, inS[e]quio features a user-friendly GUI with extensive CAD tools, capable of managing complex designs and offloading computational tasks to the cloud. It supports various DNA design formats, PDB molecule integration, and residue modifications, and includes preloaded designs and thorough documentation. With its combination of features, inS[e]quio enables a code-centric design (CCD) approach, enhancing DN construction with improved precision, scalability, and efficiency. This approach is elucidated through a streptavidin barrel cage designed via a Python notebook and a spheroid origami case study. Marking a significant advance in DN design automation, inS[e]quio, the first fully programmable 3D CAD tool for DN design, enables both manual and programmatic 3D editing. This fusion of features establishes inS[e]quio as a transformative tool, poised to significantly enhance designer productivity and expand the scope of possible designs. Extended Abstract: Advances in DNA nanotechnology have positioned the field at a juncture reminiscent of the pivotal growth phase of the electronics industry in the 1970s. The evolution of software for designing DNA nanostructures (DNs) is following a similar historical trajectory, and dozens of software packages have been developed for creating them.
Existing software options, however, require users to choose between manual design with minimal automation support and selection from a limited set of designs, typically wireframe, that can be generated from a high-level structural description. Here, we introduce the inS[e]quio Design Studio, a programmable 3D computer-aided design (CAD) application that effectively bridges this gap. By integrating a domain-specific, freeform graphical editor with a Python application programming interface (API), inS[e]quio provides a comprehensive and extensible platform for designing complex nucleic acid (NA) nanostructures. The inS[e]quio desktop application, developed in C++, runs on Windows® and Macintosh® operating systems. Its graphical user interface (GUI) features multiple synchronized view panels and a diverse set of CAD and NA-specific editing tools. Its optimized graphics pipeline enables editing of designs with more than 2 million nucleotides, and it includes an integrated service infrastructure for offloading heavy computations to cloud servers. The software also supports import and export of various DNA design file formats, integration of arbitrary PDB molecules, and specification of residue modifications. Additionally, it includes preloaded sample designs, scripts, and comprehensive documentation. Parabon has used evolving versions of inS[e]quio for over a decade to design a variety of proprietary DNs and has now transitioned it into a commercially available product. This paper summarizes inS[e]quio's features, discusses its strengths and limitations, and outlines planned enhancements. Although freeform 3D design is well supported in inS[e]quio, the integration of its CAD environment with its API facilitates a code-centric design (CCD) approach for DN construction that offers notable productivity advantages over traditional methods, including enhanced precision, scalability, and efficiency.
Here we describe CCD, outline its benefits, and demonstrate its use through a well-documented Python notebook, included with the product, which generates a sample design within the inS[e]quio application. A spheroid origami created using CCD is also presented. As the first commercial fully programmable 3D CAD application specifically created for DN design, the release of inS[e]quio represents a milestone in the field of DN design automation. It introduces a new dimension to the discipline by enabling both manual and programmatic 3D editing, thereby facilitating an innovative CCD approach. The availability of extensive documentation and technical support enables designers to efficiently adopt and utilize these capabilities. This combination of features establishes inS[e]quio as a noteworthy addition to the tools available for DN design, with the potential to significantly increase designer productivity and broaden the scope of designs that can be developed by practitioners of all skill levels. Windows and Mac versions of the inS[e]quio desktop application are available for download at https://parabon.com/insequio.
Graphical Abstract: An illustration of the inS[e]quio Design Studio desktop application interoperating with a Python Jupyter notebook and molecular dynamics (MD) simulation tools to support an iterative code-centric design (CCD) process.
The design cycle includes (a) programmatic and/or manual creation of objects in the inS[e]quio editors; (b) visual inspection and manipulation of objects via user interface; (c) in silico evaluation of designs via MD simulation using native or external tools; repeating a-c as necessary; and (d) procurement of strands and synthesis of DNA nanostructures (DNs).
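To illustrate what "code-centric design" means in general terms, the following sketch generates DNA geometry and sequence programmatically instead of placing them by hand. It deliberately does not use inS[e]quio's Python API, whose functions are not described here. The helix constants are approximate textbook values for B-DNA, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

RISE_NM = 0.34       # approximate B-DNA rise per base pair
BP_PER_TURN = 10.5   # approximate B-DNA helical repeat
RADIUS_NM = 1.0      # approximate backbone radius (helix diameter ~2 nm)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def helix_backbone(n_bp: int, phase_deg: float = 0.0) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: one backbone point per base pair along an ideal helix (nm)."""
    twist = 2.0 * math.pi / BP_PER_TURN
    points = []
    for i in range(n_bp):
        theta = math.radians(phase_deg) + i * twist
        points.append((RADIUS_NM * math.cos(theta), RADIUS_NM * math.sin(theta), i * RISE_NM))
    return points


def reverse_complement(seq: str) -> str:
    """Sequence of the antiparallel partner strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]
```

Because every coordinate and sequence is derived from parameters, a design can be regenerated at a different length or phase by changing one argument. That reproducibility is the productivity argument the abstract makes for CCD.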

5

Variant Mapping Application: Customize And Annotate Figures Of Voltage-Gated Sodium Channel

Wen, W.; Mahadevan, A.; Parrish, R. R.; Dean, R.; Cutts, A.; Johnson, J. P.

2023-12-13 neuroscience 10.1101/2023.12.09.570948 medRxiv

Top 0.1%

30.7%

Voltage-gated sodium ion channels allow for the initiation and transmission of action potentials. There is strong interest in research and drug development in selectively targeting these ion channels to treat epilepsy and other disorders such as pain. Scientific literature and presentations often incorporate maps of these integral membrane proteins with markers indicating gene mutations to highlight genotype/phenotype correlations. There is a need for automated tools to create high-quality figures with mutation (variant) locations displayed on these channel maps. This manuscript introduces a simple application, built with the D3.js library, for visualizing mutations on alpha voltage-gated sodium channels. The application allows for mapping of variant sequences, as well as important properties such as the type of variant and the phenotypes linked to the variant. It also allows for customization and the production of high-quality images for publication. This application and code base can also be extended to other ion channels.
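Placing a variant on a channel map starts with parsing its protein-level notation and assigning it to a region of the channel. The application itself is written with D3.js. The following is only a Python sketch of that mapping step. The regex handles only single-letter missense/nonsense notation such as `R1648H`, and the region boundaries are hypothetical placeholders, not real channel annotations.

```python
import bisect
import re
from typing import NamedTuple, Optional

VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z]|\*)$")  # e.g. R1648H, or R222* (stop)


class Variant(NamedTuple):
    ref: str
    pos: int
    alt: str


def parse_variant(text: str) -> Optional[Variant]:
    """Parse single-letter protein variant notation; return None if unrecognized."""
    m = VARIANT_RE.match(text.strip())
    if m is None:
        return None
    return Variant(m.group(1), int(m.group(2)), m.group(3))


# Hypothetical region start residues (placeholders, NOT real Nav channel boundaries).
REGION_STARTS = [1, 100, 400, 700, 1000, 1300]
REGION_NAMES = ["N-term", "DI", "DII", "DIII", "DIV", "C-term"]


def region_of(pos: int) -> str:
    """Look up the region whose start is the largest value <= pos."""
    idx = bisect.bisect_right(REGION_STARTS, pos) - 1
    return REGION_NAMES[max(idx, 0)]
```

A sorted list of region starts plus `bisect` gives a logarithmic-time lookup, which stays fast even when hundreds of variants are drawn on one map.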

6

TRaP: An Open-source, Reproducible Framework for Raman Spectral Preprocessing across Heterogeneous Systems

Zhu, Y.; Lionts, M. M.; Haugen, E.; Walter, A. B.; Voss, T. R.; Grow, G. R.; Liao, R.; McKee, M. E.; Locke, A.; Hiremath, G.; Mahadevan-Jansen, A.; Huo, Y.

2026-03-27 bioengineering 10.64898/2026.03.26.714582 medRxiv

Top 0.1%

19.6%

Raman spectroscopy offers a uniquely rich window into molecular structure and composition, making it a powerful tool across fields ranging from materials science to biology. However, the reproducibility of Raman data analysis remains a fundamental bottleneck. In practice, transforming raw spectra into meaningful results is far from standardized: workflows are often complex, fragmented, and implemented through highly customized, case-specific code. This challenge is compounded by the lack of unified open-source pipelines and the diversity of acquisition systems, each introducing its own file formats, calibration schemes, and correction requirements. Consequently, researchers must frequently rely on manual, ad hoc reconciliation of processing steps. To address this gap, we introduce TRaP (Toolbox for Reproducible Raman Processing), an open-source, GUI-based Python toolkit designed to bring reproducibility, transparency, and portability to Raman spectral analysis. TRaP unifies the entire preprocessing-to-analysis pipeline within a single, coherent framework that operates consistently across heterogeneous instrument platforms (e.g., Cart, Portable, Renishaw, and MANTIS). Central to its design is the concept of fully shareable, declarative workflows: users can encode complete processing pipelines into a single configuration file (e.g., JSON), enabling others to reproduce results instantly without reimplementing code or reverse-engineering undocumented steps. Beyond convenience, TRaP integrates configuration management, X-axis calibration, spectral response correction, interactive processing, and batch execution into a workflow-driven architecture that enforces deterministic, repeatable operations. Every transformation is explicitly recorded, making the full processing history transparent, inspectable, and reproducible. 
This eliminates ambiguity in how results are generated and ensures that identical protocols can be applied consistently across datasets and experimental contexts. Through representative use cases, we show that TRaP enables seamless, reproducible preprocessing of Raman spectra acquired from diverse platforms within a unified environment. We hope TRaP can help establish Raman data processing as a reproducible, shareable, and systematized scientific practice, aligned with modern standards for computational research. TRaP is released as open-source software at https://github.com/hrlblab/TRaP
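The declarative-workflow idea described above (a whole pipeline encoded in one JSON file, with every transformation recorded) can be sketched as a registry of named steps driven by a config. The step names and JSON schema are hypothetical, not TRaP's actual format, and min-subtraction is only a crude stand-in for real baseline correction.

```python
import json
from typing import Callable, Dict, List, Tuple

Spectrum = List[float]
STEPS: Dict[str, Callable[..., Spectrum]] = {}


def register_step(name: str):
    """Decorator that makes a processing function addressable by name from a config."""
    def register(fn: Callable[..., Spectrum]) -> Callable[..., Spectrum]:
        STEPS[name] = fn
        return fn
    return register


@register_step("crop")
def crop(y: Spectrum, start: int, stop: int) -> Spectrum:
    return y[start:stop]


@register_step("subtract_min")
def subtract_min(y: Spectrum) -> Spectrum:
    lowest = min(y)
    return [v - lowest for v in y]


@register_step("normalize_max")
def normalize_max(y: Spectrum) -> Spectrum:
    highest = max(y)
    return [v / highest for v in y] if highest else list(y)


def run_workflow(config_json: str, y: Spectrum) -> Tuple[Spectrum, List[dict]]:
    """Apply the configured steps in order and record each one in a history log."""
    config = json.loads(config_json)
    history: List[dict] = []
    for spec in config["steps"]:
        params = spec.get("params", {})
        y = STEPS[spec["name"]](y, **params)
        history.append({"name": spec["name"], "params": params})
    return y, history


CONFIG = json.dumps({"steps": [
    {"name": "crop", "params": {"start": 1, "stop": 4}},
    {"name": "subtract_min"},
    {"name": "normalize_max"},
]})
result, history = run_workflow(CONFIG, [9.0, 2.0, 4.0, 6.0, 9.0])
```

Because the config alone determines the output, sharing `CONFIG` is enough for someone else to reproduce `result`. The `history` list is a minimal form of the processing record the abstract describes.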

7

Cramming Protein Language Model Training in 24 GPU Hours

Frey, N. C.; Joren, T.; Ismail, A.; Goodman, A.; Bonneau, R.; Cho, K.; Gligorijevic, V.

2024-05-15 bioengineering 10.1101/2024.05.14.594108 medRxiv

Top 0.1%

19.2%

Protein language models (pLMs) are ubiquitous across biological machine learning research, but state-of-the-art models like ESM2 take hundreds of thousands of GPU hours to pre-train on the vast protein universe. Resource requirements for scaling up pLMs prevent fundamental investigations into how optimal modeling choices might differ from those used in natural language. Here, we define a "cramming" challenge for pLMs and train performant models in 24 hours on a single GPU. By re-examining many aspects of pLM training, we are able to train a 67 million parameter model in a single day that achieves performance on downstream protein fitness landscape inference tasks comparable to ESM-3B, a model trained for over 15,000× more GPU hours than ours. We open source our library for training and inference, LBSTER: Language models for Biological Sequence Transformation and Evolutionary Representation.
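As a consistency check on the figures above: one day on a single GPU is 24 GPU hours, so "over 15,000× more" places ESM-3B's training budget above

```latex
24~\text{GPU h} \times 15{,}000 = 360{,}000~\text{GPU h},
```

which agrees with the "hundreds of thousands of GPU hours" quoted for state-of-the-art pLMs.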

8

PAV-spotter: using signal cross-correlations to identify Presence/Absence Variation in target capture data

de Visser, M. C.; Ploeg, C. v. d.; Cvijanovic, M.; Vucic, T.; Theodoropoulos, A.; Wielstra, B.

2025-01-24 bioengineering 10.1101/2024.10.25.620064 medRxiv

Top 0.1%

18.1%

High-throughput sequencing technologies have become essential in the fields of evolutionary biology and genomics. When dealing with non-model organisms or genomic gigantism, sequencing whole genomes is still relatively costly, and therefore reduced-genome representations are frequently obtained, for instance by target capture approaches. While computational tools exist that can handle target capture data and identify small-scale variants such as single nucleotide polymorphisms and micro-indels, options to identify large-scale structural variants are limited. To meet this need, we introduce PAV-spotter: a tool that can identify presence/absence variation (PAV) in target capture data. PAV-spotter conducts a signal cross-correlation calculation in which the distributions of read counts per target are compared between samples of different a priori defined classes (e.g., male versus female, or diseased versus healthy). We apply and test our methodology by studying Triturus newts: salamanders with gigantic genomes that currently lack an annotated reference genome. Triturus newts suffer from a hereditary disease that kills half their offspring during embryogenesis. We compare the target capture data of two different types of diseased embryos, characterized by unique deletions, with those of healthy embryos. Our findings show that PAV-spotter helps to expose such structural variants, even in the face of medium to low sequencing coverage, small sample sizes, and background noise due to mis-mapped reads. PAV-spotter can be used to study the structural variation underlying supergene systems in the absence of whole-genome assemblies. The code, including further explanation, is available through the PAV-spotter GitHub repository: https://github.com/Wielstra-Lab/PAVspotter.
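The class-comparison idea above can be illustrated with a simplified stand-in. The sketch normalizes read counts by library size, averages per-target profiles within each class, measures overall profile similarity with a zero-lag Pearson correlation, and flags targets whose mean count nearly vanishes in one class. This is not PAV-spotter's exact cross-correlation procedure, which is documented in its repository. The `ratio` threshold and the example counts are hypothetical.

```python
import math
from statistics import mean
from typing import List

Counts = List[float]  # read counts per target for one sample


def normalize(counts: Counts) -> Counts:
    """Scale counts by library size so samples with different depth are comparable."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


def class_profile(samples: List[Counts]) -> Counts:
    """Mean normalized count per target across all samples of one class."""
    norm = [normalize(s) for s in samples]
    return [mean(col) for col in zip(*norm)]


def pearson(x: Counts, y: Counts) -> float:
    """Zero-lag Pearson correlation between two per-target profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_absent(p1: Counts, p2: Counts, ratio: float = 0.1) -> List[int]:
    """Targets whose mean count in one class is at most `ratio` of the other class."""
    return [i for i, (a, b) in enumerate(zip(p1, p2))
            if max(a, b) > 0 and min(a, b) <= ratio * max(a, b)]


healthy = class_profile([[10, 20, 30, 40], [12, 18, 33, 37]])
diseased = class_profile([[11, 22, 0, 47], [9, 19, 1, 41]])
candidates = flag_absent(healthy, diseased)
```

Normalizing before comparing matters because a target with half the reads in one sample may simply reflect lower sequencing depth rather than a deletion.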

9

genomalicious: serving up a smorgasbord of R functions for population genomic analyses

Thia, J. A.; Riginos, C.

2019-07-05 bioinformatics 10.1101/667337 medRxiv

Top 0.1%

17.8%

Turning SNP data into biologically meaningful results requires considerable computational acrobatics, including importing, exporting, and manipulating data among different analytical packages and programming environments, and finding ways to visualise results for data exploration and presentation. I present GENOMALICIOUS, an R package designed to provide a selection of functions for population genomicists to simply, intuitively, and flexibly guide SNP data through their analytical pipelines, within and outside R. At the core of the original GENOMALICIOUS workflow is the conversion of genomic variant data into a data.table object. This provides a useful way of storing large amounts of data in an intuitive format that can be easily manipulated using methods unique to this object class. Over time, GENOMALICIOUS has grown to cater to a range of analyses in population structure and demography, adaptive evolution, quantitative traits, and phylogenetics. Researchers using pooled allele frequencies, or individually sequenced genotypes, are sure to find functions that accommodate their tastes in GENOMALICIOUS. The simplicity and accessibility of pipelines in GENOMALICIOUS may also serve as a useful tool for teaching basic population genetics and genomics in an R environment. The source code and a series of tutorials for this package are freely available on GitHub.
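The core idea above, storing genotypes as a long table (one row per sample and locus) so that summaries become simple group-by operations, can be sketched in Python. This is an analogue of the data.table approach, not genomalicious's R API. The sample and locus names are hypothetical, and genotypes are assumed to be coded as diploid alternate-allele counts (0/1/2).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical wide genotype matrix: sample -> locus -> alternate-allele count (0/1/2).
wide: Dict[str, Dict[str, int]] = {
    "ind1": {"snp1": 0, "snp2": 1},
    "ind2": {"snp1": 2, "snp2": 1},
    "ind3": {"snp1": 1, "snp2": 0},
}

Row = Tuple[str, str, int]  # (sample, locus, genotype)


def to_long(matrix: Dict[str, Dict[str, int]]) -> List[Row]:
    """Melt the wide matrix into one row per sample x locus."""
    return [(sample, locus, gt) for sample, loci in matrix.items() for locus, gt in loci.items()]


def alt_allele_freq(rows: List[Row], ploidy: int = 2) -> Dict[str, float]:
    """Group rows by locus and compute the alternate-allele frequency."""
    alt = defaultdict(int)
    total = defaultdict(int)
    for _sample, locus, gt in rows:
        alt[locus] += gt
        total[locus] += ploidy
    return {locus: alt[locus] / total[locus] for locus in alt}


long_rows = to_long(wide)
freqs = alt_allele_freq(long_rows)
```

The long layout makes adding a new per-row attribute (population, pool, or read depth) a matter of adding a column rather than restructuring a matrix.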

10

bletl - A Python package for integrating microbioreactors in the design-build-test-learn cycle

Osthege, M.; Tenhaef, N.; Zyla, R.; Mueller, C.; Hemmerich, J.; Wiechert, W.; Noack, S.; Oldiges, M.

2021-08-25 bioengineering 10.1101/2021.08.24.457462 medRxiv

Top 0.1%

15.6%

Microbioreactor (MBR) devices have emerged as powerful cultivation tools for tasks of microbial phenotyping and bioprocess characterization and provide a wealth of online process data in a highly parallelized manner. Such datasets are difficult to interpret quickly with manual workflows. In this study, we present the Python package bletl and show how it enables robust data analyses and the application of machine learning techniques without tedious data parsing and preprocessing. bletl reads raw result files from BioLector I, II and Pro devices to make all the contained information available to Python-based data analysis workflows. Together with standard tooling from the Python scientific computing ecosystem, interactive visualizations and spline-based derivative calculations can be performed. Additionally, we present a new method for unbiased quantification of the time-variable specific growth rate μ based on a novel method of unsupervised switchpoint detection with Student-t distributed random walks. With an adequate calibration model, this method enables practitioners to quantify time-variable growth rate with Bayesian uncertainty quantification and automatically detect switchpoints that indicate relevant metabolic changes. Finally, we show how time series feature extraction enables the application of machine learning methods to MBR data, resulting in unsupervised phenotype characterization. As an example, t-distributed Stochastic Neighbor Embedding (t-SNE) is performed to visualize datasets comprising a variety of growth/DO/pH phenotypes. Practical Application: The bletl package can be used to analyze microbioreactor datasets in both data analysis and autonomous experimentation workflows. Using the example of BioLector datasets, we show that loading such datasets into commonly used data structures with one line of Python code is a significant improvement over spreadsheet or hand-crafted scripting approaches.
On top of established standard data structures, practitioners may continue with their favorite data analysis routines or make use of the additional analysis functions that we specifically tailored to the analysis of microbioreactor time series. In particular, our function to fit cross-validated smoothing splines can be used for online signals from any microbioreactor system and has the potential to improve the robustness and objectivity of many data analyses. Likewise, our random-walk-based μ method for inferring growth rates under uncertainty, as well as the time-series feature extraction, may be applied to online data from other cultivation systems.
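The quantity estimated above is the specific growth rate, μ = (1/X)·dX/dt = d ln X/dt, where X is biomass. bletl itself uses cross-validated smoothing splines and a Student-t random walk model. The following sketch swaps in a naive central finite difference of ln X, only to make the definition concrete. On noisy real data this estimator would need smoothing first.

```python
import math
from typing import List


def specific_growth_rate(t: List[float], x: List[float]) -> List[float]:
    """Central-difference estimate of mu = d ln(X) / dt at the interior time points."""
    ln_x = [math.log(v) for v in x]
    return [(ln_x[i + 1] - ln_x[i - 1]) / (t[i + 1] - t[i - 1]) for i in range(1, len(t) - 1)]


# Synthetic exponential growth with mu = 0.4 1/h (illustrative values only).
t = [0.0, 0.5, 1.0, 1.5, 2.0]
x = [0.1 * math.exp(0.4 * ti) for ti in t]
mu = specific_growth_rate(t, x)
```

For pure exponential growth, ln X is linear in time, so the estimate recovers the constant rate. A switchpoint in a real culture would show up as a step change in μ(t).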

11

A new framework for analyzing mass transport in cortical brain tissue at <10 nm resolution

Elbert, D. L.; Esguerra, D. P.

2025-04-08 bioengineering 10.1101/2025.04.02.646802 medRxiv

Top 0.1%

15.2%

Transmission electron microscopy of brain tissue yields high-resolution maps of the spatial organization of cells in the central nervous system. Automated segmentation identifies distinct biological cells, and the resulting segmented images are easily converted into surface meshes. The surface meshes are structural models of cell surfaces, providing a framework for modeling mass transport within the interstitial fluid of brain tissue that surrounds the cells. Our goal is to model the production and clearance of proteins implicated in the development of Alzheimer's disease. This work introduces a new custom computational framework that allows massive parallelization of the solution of the mathematical equations of mass transfer. The diffusion equation was solved directly on the surface meshes of multiple biological cells in parallel, with exchange of mass across co-localized faces at the end of each time step. Mass transfer across the interfaces of multiple analysis volumes was incorporated using a semi-implicit approach. To demonstrate the capabilities of the framework, unsteady mass transfer along cell surfaces was modeled in an array of eight 4 × 4 × 4 µm analysis volumes containing 2175 biological cells at 8 nm resolution, comprising 313 million face elements. Areas of enhanced and hindered diffusive transport were identified, suggesting structural motifs that may contribute to the development of insoluble plaques. The tortuosity and fractional anisotropy were consistent with diffusion tensor imaging (DTI) measurements in cortical tissue. This new framework allows modeling of diffusive mass transport while preserving anatomical details at <10 nm resolution. Author summary: A framework was developed to model mass transport around thousands of biological cells in parallel. This may be useful to identify structural features in cortical tissue that are more prone to the development of amyloid plaques.
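The two summary measures compared with DTI above have standard definitions. Tortuosity is λ = √(D_free / D_eff), and fractional anisotropy is computed from the three eigenvalues of the diffusion tensor. The following sketch implements those textbook formulas. How the authors extracted D_eff and the tensor from their simulations is not stated here and is not reproduced.

```python
import math


def tortuosity(d_free: float, d_eff: float) -> float:
    """lambda = sqrt(D_free / D_eff); 1.0 means unhindered diffusion."""
    return math.sqrt(d_free / d_eff)


def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """Standard FA from diffusion-tensor eigenvalues: 0 = isotropic, 1 = fully anisotropic."""
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return math.sqrt(0.5 * num / den)
```

For example, an effective diffusivity one quarter of the free value gives a tortuosity of 2, and equal eigenvalues give an FA of 0.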

12

A heat method to interpolate z-stacked cell images that preserves interfaces without explicit identification of object boundaries

Elbert, D. L.

2025-04-17 bioengineering 10.1101/2025.04.11.648268 medRxiv

Top 0.1%

15.1%

A multitude of methods have been developed to morph one image into another. In scientific and medical imaging, it is common to have lower resolution in one imaging direction. To improve the quality of surface meshes generated from these images, morphing is used to produce intermediate images to achieve equal resolution in all directions. The most common method is based on interpolating signed distance functions that describe the shapes of objects in the images. Most methods rely on identification and recording of object boundaries. The situation becomes more complex in the case of interpolating images containing multiple objects of interest wherein the morphing method must not introduce gaps or overlaps of objects in the intermediate images. The goal of the present work was to develop an interpolation method for biological cells in transmission electron microscopy images meeting the constraints listed above. A heat diffusion method was developed that produces steady state isotherms in a 3D model of the intervening space between two images. The isotherms are then used to produce paths from pixels that are unique to a biological cell in one frame towards the overlap region. Instead of identifying boundaries, only membership in the overlap region is tested. For each pixel that terminates paths, the path lengths of all members are normalized to the maximum path length. The path lengths are then used to produce intermediate frames. The method succeeds in avoiding the introduction of gaps or overlap between neighboring biological cells without the need for boundary identification.
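Two ingredients of the method above can be sketched in isolation: relaxing a heat (Laplace) equation to steady state between two frames, and normalizing path lengths to the longest one. This is a deliberately simplified 2D toy. Row 0 is held at 1.0 (standing in for frame A), the last row is held at 0.0 (frame B), and the side walls are insulated. The paper's 3D geometry, its per-cell boundary assignment, and the path tracing along isotherms are not reproduced.

```python
from typing import List

Grid = List[List[float]]


def steady_state(n_rows: int, n_cols: int, iters: int = 500) -> Grid:
    """Jacobi relaxation of Laplace's equation: row 0 fixed at 1.0, last row fixed
    at 0.0, zero-flux side walls (a missing neighbour is replaced by the cell itself)."""
    u = [[1.0] * n_cols] + [[0.0] * n_cols for _ in range(n_rows - 1)]
    for _ in range(iters):
        new = [row[:] for row in u]
        for i in range(1, n_rows - 1):
            for j in range(n_cols):
                left = u[i][j - 1] if j > 0 else u[i][j]
                right = u[i][j + 1] if j < n_cols - 1 else u[i][j]
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + left + right)
        u = new
    return u


def normalize_path_lengths(lengths: List[float]) -> List[float]:
    """Scale the path lengths that end at one terminal pixel by their maximum."""
    longest = max(lengths)
    return [length / longest for length in lengths]


u = steady_state(5, 4)
```

In this trivial geometry the isotherms are evenly spaced rows. With cell-shaped boundaries they bend around objects, which is what lets the paths respect interfaces without explicitly detecting boundaries.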

13

Scalable graph analysis tools for the connectomics community

Matelsky, J. K.; Johnson, E. C.; Wester, B.; Gray-Roncal, W. R.

2022-06-02 neuroscience 10.1101/2022.06.01.494307 medRxiv

Top 0.1%

15.1%

Neuroscientists now have the opportunity to analyze synaptic-resolution connectomes that are larger than the memory of a single consumer workstation. As dataset size and tissue diversity have grown, there is increasing interest in conducting comparative connectomics research, including rapidly querying and searching for recurring patterns of connectivity across brain regions and species. There is also a demand for algorithm reuse: applying methods developed for one dataset to another volume. A key technological hurdle is enabling researchers to efficiently and effectively query these diverse datasets, especially as the raw image volumes grow beyond terabyte sizes. Existing community tools can perform such queries and analyses on smaller-scale datasets that fit in local memory, but the path to scaling remains unclear. Existing solutions such as neuPrint or FlyBrainLab enable these queries for specific datasets, but there remains a need to generalize algorithms and standards across datasets. To overcome this challenge, we present a software framework for comparative connectomics and graph discovery that makes connectomes easy to analyze, even when they are larger than RAM or stored in disparate datastores. This software suite includes visualization tools, a web portal, a connectivity and annotation query engine, and the ability to interface with a variety of data sources and community tools from neuroscience. These tools include MossDB (an immutable datastore for metadata and rich annotations); Grand (for prototyping larger-than-RAM graphs); GrandIso-Cloud (for querying existing graphs that exceed the capabilities of a single workstation); and Motif Studio (for enabling the public to query across connectomes). These tools interface with existing frameworks such as neuPrint, graph databases such as Neo4j, and standard data analysis tools such as Pandas or NetworkX.
Together, these tools enable tool and algorithm reuse, standardization, and neuroscience discovery.
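The "recurring patterns of connectivity" queries above are subgraph-matching problems. The following brute-force sketch counts one classic motif, the feed-forward loop (a→b, b→c, a→c), in a tiny directed graph. It is O(n³) and only illustrates what a motif query asks. It is not how GrandIso or Motif Studio work, and the graph is hypothetical. For general patterns, NetworkX provides `networkx.algorithms.isomorphism.DiGraphMatcher`.

```python
from itertools import permutations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of postsynaptic targets


def count_feed_forward_loops(adj: Graph) -> int:
    """Count ordered node triples (a, b, c) with edges a->b, b->c and a->c."""
    count = 0
    for a, b, c in permutations(adj, 3):
        if b in adj[a] and c in adj[b] and c in adj[a]:
            count += 1
    return count


# Hypothetical toy connectome; every node must appear as a key.
toy: Graph = {"A": {"B", "C"}, "B": {"C"}, "C": set(), "D": {"A"}}
ffl_count = count_feed_forward_loops(toy)
```

The cubic cost of enumerating every triple is exactly why connectome-scale motif search needs the indexed, distributed approaches the abstract describes.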

14

hyve, a compositional visualisation engine for brain imaging data

Ciric, R.; Xu, A.; Poldrack, R. A.

2024-04-21 bioinformatics 10.1101/2024.04.18.590179 medRxiv

Top 0.1%

15.0%

Visualisations facilitate the interpretation of geometrically structured data and results. However, heterogeneous geometries--such as volumes, surfaces, and networks--have traditionally mandated different software approaches. We introduce hyve, a Python library that uses a compositional functional framework to enable parametric implementation of custom visualisations for different brain geometries. Under this framework, users compose a reusable visualisation protocol from geometric primitives for representing data geometries, input primitives for common data formats and research objectives, and output primitives for producing interactive displays or configurable snapshots. hyve also writes documentation for user-constructed protocols, automates serial production of multiple visualisations, and includes an API for semantically organising an editable multi-panel figure. Through the seamless composition of input, output, and geometric primitives, hyve supports creating visualisations for a range of neuroimaging research objectives.
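The compositional framework above, where a visualisation protocol is assembled from input, geometric, and output primitives, can be illustrated with plain function composition. The primitive names, the dictionary state, and the output string are hypothetical. hyve's actual primitives and signatures differ, so this shows only the design idea.

```python
from functools import reduce
from typing import Any, Callable, Dict

State = Dict[str, Any]
Primitive = Callable[[State], State]


def compose(*primitives: Primitive) -> Primitive:
    """Build a reusable protocol that applies the primitives left to right."""
    return lambda state: reduce(lambda s, p: p(s), primitives, state)


def load_values(state: State) -> State:        # hypothetical input primitive
    return {**state, "values": [0.2, 0.8, 0.5]}


def surface_geometry(state: State) -> State:   # hypothetical geometric primitive
    return {**state, "geometry": "surface"}


def snapshot_output(state: State) -> State:    # hypothetical output primitive
    return {**state, "output": f"{state['geometry']} snapshot of {len(state['values'])} values"}


protocol = compose(load_values, surface_geometry, snapshot_output)
result = protocol({})
```

Since each primitive only reads and extends a shared state, swapping `surface_geometry` for a volume or network primitive changes the visualisation without touching the input or output stages.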

15

MiDAS - Meaningful Immunogenetic Data at Scale

Migdal, M.; Ruan, D. F.; Forrest, W. F.; Horowitz, A.; Hammer, C.

2021-01-14 bioinformatics 10.1101/2021.01.12.425276 medRxiv

Top 0.1%

15.0%

Human immunogenetic variation in the form of HLA and KIR types has been shown to be strongly associated with a multitude of immune-related phenotypes. We present MiDAS, an R package that enables statistical association analysis and provides immunogenetic data transformation functions for HLA amino acid fine mapping, analysis of HLA evolutionary divergence, and HLA-KIR interactions. MiDAS closes the gap between the inference of immunogenetic variation and its efficient use to make meaningful discoveries.
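A typical preprocessing step before HLA association analysis is harmonizing allele names to a common resolution under the standard colon-delimited nomenclature (for example `HLA-A*02:01:01:01` reduced to two fields, `A*02:01`). The following Python sketch shows that step. It is not a MiDAS function, and it discards any expression-status suffix (such as `N` for null alleles), which a real pipeline should handle explicitly.

```python
import re
from typing import Optional

ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)([A-Z]?)$")


def reduce_resolution(allele: str, fields: int = 2) -> Optional[str]:
    """Truncate an HLA allele name to the first `fields` colon-separated fields.
    Returns None if the name does not match the expected pattern. Note: the
    expression-status suffix (e.g. 'N') is dropped in this simplified sketch."""
    m = ALLELE_RE.match(allele.strip())
    if m is None:
        return None
    gene, numbers, _suffix = m.groups()
    return f"{gene}*{':'.join(numbers.split(':')[:fields])}"
```

Reducing all typing results to the same field count before counting carriers prevents one allele from being split across several resolution-specific labels.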

16

DigitalPedon: A Novel Digital Twin Framework for Soil Profile Monitoring and Global Soil Data Interoperability

Youssef, A.; Badreldin, N.

2026-05-08 bioengineering 10.64898/2026.05.05.722891 medRxiv

Top 0.1%

13.0%

The Digital Pedon (DP) is an open-source Python framework that represents a soil profile as a continuously updated digital twin, bridging three persistent gaps in soil science: disconnected models and observations, cross-database interoperability, and the inference gap between raw sensor signals and agronomically meaningful variables. Integrating real-time sensor streams, model-based solver chains (Model-Zoo), GLOSIS-compliant ontology mapping, and a novel LLM agentic interface layer enabling natural language soil queries, the DP supports applications spanning precision agriculture, digital soil mapping, and environmental sustainability assessment. Four proof-of-concept experiments confirm automatic profile initialisation fidelity, solver chain consistency, ontology compliance, and user-defined solver extensibility.
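The "solver chain" that closes the gap between raw sensor signals and agronomic variables can be pictured as a sequence of functions, each adding a derived quantity to the profile state. In the sketch below, the first step is a hypothetical linear sensor calibration with placeholder gain and offset. The second is the widely used empirical Topp et al. (1980) relation between dielectric permittivity ε and volumetric water content θ. The DP's actual Model-Zoo solvers are not reproduced.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Solver = Callable[[State], State]


def raw_to_permittivity(state: State) -> State:
    """Hypothetical linear calibration from raw sensor counts to permittivity."""
    return {**state, "permittivity": 1.0 + 0.01 * state["raw_counts"]}


def topp_water_content(state: State) -> State:
    """Topp et al. (1980): theta = -5.3e-2 + 2.92e-2*eps - 5.5e-4*eps^2 + 4.3e-6*eps^3."""
    eps = state["permittivity"]
    theta = -5.3e-2 + 2.92e-2 * eps - 5.5e-4 * eps ** 2 + 4.3e-6 * eps ** 3
    return {**state, "theta_v": theta}


def run_chain(state: State, chain: List[Solver]) -> State:
    """Apply solvers in order; each one may use outputs of the previous ones."""
    for solver in chain:
        state = solver(state)
    return state


out = run_chain({"raw_counts": 1900.0}, [raw_to_permittivity, topp_water_content])
```

With ε = 20, the Topp relation gives θ ≈ 0.345 m³/m³. Adding a user-defined solver only requires appending another function with the same signature to the chain.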

17

Ethoscopy & Ethoscope-lab: a framework for behavioural analysis to lower entrance barrier and aid reproducibility

Blackhurst, L.; Gilestro, G. F.

2022-11-29 neuroscience 10.1101/2022.11.28.517675 medRxiv

Top 0.1%

12.7%

Summary: High-throughput analysis of behaviour is a pivotal instrument in modern neuroscience, allowing researchers to combine modern genetic breakthroughs with unbiased, objective, reproducible experimental approaches. To this end, we recently created an open-source hardware platform (ethoscope (Geissmann et al., 2017)) that allows for inexpensive, accessible, high-throughput analysis of behaviour in Drosophila or other animal models. Here we equip ethoscopes with a Python framework for data analysis, ethoscopy, designed to be a user-friendly yet powerful platform that meets the requirements of researchers with limited coding expertise as well as experienced data scientists. Ethoscopy is best consumed in a prebaked Jupyter-based Docker container, ethoscope-lab, to improve accessibility and to encourage the use of notebooks as a natural platform to share post-publication data analysis. Availability and implementation: Ethoscopy is a Python package available on GitHub and PyPI. Ethoscope-lab is a Docker container available on DockerHub. A landing page aggregating all the code and documentation is available at https://lab.gilest.ro/ethoscopy.
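A common first step in Drosophila behavioural analysis is scoring sleep from movement data. A widely used convention defines sleep as a bout of inactivity lasting at least 5 minutes. The following sketch applies that rule to per-minute movement flags. The 1-minute binning and the function name are assumptions, and ethoscopy's own API is not reproduced.

```python
from typing import List, Tuple


def sleep_bouts(moving: List[bool], min_bout: int = 5) -> List[Tuple[int, int]]:
    """Return (start_bin, length) for runs of immobility lasting >= min_bout bins."""
    bouts: List[Tuple[int, int]] = []
    start = None
    for i, m in enumerate(moving + [True]):  # trailing sentinel closes a final run
        if not m and start is None:
            start = i
        elif m and start is not None:
            if i - start >= min_bout:
                bouts.append((start, i - start))
            start = None
    return bouts


# Hypothetical 1-min bins: 2 min moving, 6 min still, 1 moving, 3 still, 1 moving.
moving = [True] * 2 + [False] * 6 + [True] + [False] * 3 + [True]
bouts = sleep_bouts(moving)
total_sleep_min = sum(length for _, length in bouts)
```

The 3-minute pause is not counted as sleep under this convention, which is why the threshold is exposed as a parameter rather than hard-coded.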

18

CellScape: Protein structure visualization with vector graphics cartoons

Silvestre-Ryan, J.; Fletcher, D. A.; Holmes, I.

2022-06-16 bioinformatics 10.1101/2022.06.14.495869 medRxiv

Top 0.1%

12.6%


Motivation: Illustrative renderings of proteins are useful aids for scientific communication and education. Nevertheless, few software packages exist to automate the generation of these visualizations. Results: We introduce CellScape, a tool designed to generate 2D molecular cartoons from atomic coordinates and combine them into larger cellular scenes. These illustrations can outline protein regions at different levels of detail. Unlike most molecular visualization tools, which use raster image formats, CellScape represents these illustrations as vector graphics, making them easily editable and composable with other graphics. Availability and Implementation: CellScape is implemented in Python 3 and freely available at https://github.com/jordisr/cellscape. It can be run as a command-line tool or interactively in a Jupyter notebook. Contact: jordisr@berkeley.edu

19

Integrating Patient Metadata and Genetic Pathogen Data: Advancing Pandemic Preparedness with a Multi-Parametric Simulator

Bonjean, M.; Ambroise, J.; Connolly, M.; Hayes, J.; Hurel, J.; Sentis, A.; Orchard, F.; Gala, J.-L.

2023-08-23 bioengineering 10.1101/2023.08.22.554132 medRxiv

Top 0.1%

12.5%


Training and practice are needed to handle an unusual crisis quickly, safely, and effectively. Functional and table-top exercises simulate anticipated CBRNe (Chemical, Biological, Radiological, Nuclear, and Explosive) and public health crises with complex scenarios based on realistic epidemiological, clinical, and biological data from affected populations. For this reason, the use of anonymized databases, such as those from ECDC or NCBI, is necessary to run meaningful exercises. Creating a training scenario requires connecting different datasets that characterise the population groups exposed to the simulated event. This involves interconnecting laboratory, epidemiological, and clinical data, alongside demographic information. The sharing and connection of data among EU member states currently faces shortcomings due to a variety of factors, including variations in data collection methods, standardisation practices, legal frameworks, privacy and security regulations, as well as disparities in resources and infrastructure. During the H2020 project PANDEM-2 (Pandemic Preparedness and Response), we developed a multi-parametric training tool to artificially link laboratory data and metadata. We used SARS-CoV-2 data and the ECDC and NCBI open-access databases to enhance pandemic preparedness. We developed a comprehensive training procedure that encompasses guidelines, scenarios, and answers, all designed to help users work effectively with the simulator. Our tool enables training managers and trainees to enrich existing datasets by generating additional variables through data-driven or random simulations. Furthermore, it can increase the proportion of a specific variable within a given set, allowing scenarios to be customized to achieve desired outcomes. Our multi-parameter simulation tool is contained in the R package Pandem2simulator, available at https://github.com/maous1/Pandem2simulator. A Shiny application, developed to make the tool easy to use, is available at https://uclouvain-ctma.Shinyapps.io/Multi-parametricSimulator/. The tool runs in seconds despite using large data sets. In conclusion, this multi-parametric training tool can simulate any crisis scenario, improving pandemic and CBRN preparedness and response. The simulator serves as a platform to develop methodology and graphical representations for future database-connected applications.

20

golgi: open-source software for automated nerve model generation and recruitment simulation

Lung, D.; Jia, Y.; Moro, A.; Fachino, M.; Haberbusch, M.

2026-07-13 bioengineering 10.64898/2026.07.10.737846 medRxiv

Top 0.1%

12.2%


golgi is an open-source platform that takes a peripheral nerve from image to stimulated fiber population through a single graphical interface, with an equivalent scriptable Python API and command-line interface for batch and high-performance use. It integrates promptable image segmentation, automated multi-region tetrahedral meshing, anisotropic finite-element solution of the extracellular field with an explicit perineurium contact impedance, generation of realistic fiber populations and their three-dimensional trajectories, and computation of biophysical activation thresholds through two interchangeable backends: NEURON (via PyFibers) and a GPU-accelerated surrogate (AxonML). Every study exports as an integrity-hashed bundle whose image-to-recruitment provenance is verifiable byte-for-byte. golgi lowers the barrier to in-silico peripheral nerve stimulation modeling for experimentalists and clinicians, using a fully open finite-element stack with no commercial dependencies.